Set UCX_CUDA_COPY_MAX_REG_RATIO=1.0 when GPUs permit #824

Merged


pentschev
Member

This is necessary to maintain good performance on GPUs with a large BAR1 size (greater than or equal to the total GPU memory size). Starting with UCX 1.12, this value defaults to 0.1 to ensure registration doesn't fail on GPUs whose BAR1 size is lower than the total GPU memory size (e.g., T4).

Currently we set it to 1.0 when all GPUs have a large BAR1; in the future it may be useful to compute the real BAR1/TotalGPUMemory ratio and use that instead. This is also a global setting. For fine-grained control the application should change this value itself, since UCX-Py doesn't know at import time which GPUs the application is going to use.

Additionally, set up the logger before environment variables are set, and log what's being set.
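
For readers who want the gist without opening the diff, here is a rough sketch of the logic described above. It is not the code from this PR: the function names, the logger name, and the choice to leave a user-provided value untouched are assumptions made for illustration. The NVML query assumes the `pynvml` package is installed and is kept separate, so the decision logic can be exercised without a GPU.

```python
import logging
import os

# Hypothetical logger name; configured before any environment variable is set
# so that the change can be logged.
logger = logging.getLogger("ucx")

VAR = "UCX_CUDA_COPY_MAX_REG_RATIO"


def all_gpus_have_large_bar1(sizes):
    """Return True if every (bar1_total, total_memory) pair has BAR1 >= memory.

    An empty list yields False: with no GPUs there is nothing to permit.
    """
    sizes = list(sizes)
    return bool(sizes) and all(bar1 >= total for bar1, total in sizes)


def query_gpu_sizes():
    """Collect (bar1_total, total_memory) in bytes for each visible GPU.

    Assumes pynvml is available; imported lazily so the rest of the sketch
    works without it.
    """
    import pynvml

    pynvml.nvmlInit()
    try:
        sizes = []
        for i in range(pynvml.nvmlDeviceGetCount()):
            handle = pynvml.nvmlDeviceGetHandleByIndex(i)
            bar1 = pynvml.nvmlDeviceGetBAR1MemoryInfo(handle).bar1Total
            total = pynvml.nvmlDeviceGetMemoryInfo(handle).total
            sizes.append((bar1, total))
        return sizes
    finally:
        pynvml.nvmlShutdown()


def maybe_set_max_reg_ratio(sizes, environ=os.environ):
    """Set the ratio to 1.0 only when every GPU permits it.

    Assumption: a value already present in the environment is respected.
    Returns True if the variable was set by this call.
    """
    if VAR in environ:
        return False
    if not all_gpus_have_large_bar1(sizes):
        return False
    environ[VAR] = "1.0"
    logger.info("Setting %s=1.0", VAR)
    return True
```

At import time this would be invoked roughly as `maybe_set_max_reg_ratio(query_gpu_sizes())`. An application that knows which GPUs it will use can export `UCX_CUDA_COPY_MAX_REG_RATIO` itself before the import, and the sketch then leaves that value alone.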

@pentschev pentschev requested a review from a team as a code owner January 3, 2022 19:52
@quasiben
Member

quasiben commented Jan 3, 2022

Thank you for tracking this down, @pentschev!

@pentschev pentschev changed the title Set UCX_CUDA_COPY_MAX_REG_RATIO=1.0 when GPU permits Set UCX_CUDA_COPY_MAX_REG_RATIO=1.0 when GPUs permit Jan 3, 2022
@pentschev
Member Author

rerun tests

@quasiben
Member

quasiben commented Jan 4, 2022

Tested and confirmed this fixed a performance issue -- merging in.

Thank you again, @pentschev!

@quasiben quasiben merged commit 55d5724 into rapidsai:branch-0.24 Jan 4, 2022
@pentschev pentschev deleted the ucx-cuda-copy-max-reg-ratio branch January 7, 2022 11:43